The Best of MacTutor - Source Code for Volumes 1 to 5


/ The Best of MacTutor - S…e Code for Volumes 1 to 5 / The Best of MacTutor - Source Code for Volume 1-5 (Wayzata Technology)(6031)(1990).bin / Source Code / #05 (Dec85-Jan86) / pascal / Mondrian & dissolve source 1-13 / dissbits.asm


Assembly Source File | 1990-01-01 | 43KB | 981 lines

;
; procedure dissBits (srcB, destB: bitMap; srcR, dstR: rect); external;
;
; mike morton
; release: 15 september 1985
;
; this is the version 5.2.
;
; differences from version 5 to 5.2 are:
;     bug fixed in the full-screen-width case (first bit done right)
;     magic-constant table is compacted, stored in bytes instead of longs
;     some small changes to save space, plus the table saves about 100 bytes
;     tested by a program on several million cases; this helped find...
;     yet more bugs in log2 routine fixed ("bitwidth" routine added)
; differences from version 4 are:
;     nasty bug fixed in copying small rectangles
;     new address for correspondence
;     slightly more detailed notes on calling from C
;     various miscellaneous comment changes
; differences from version 3 are:
;     bugs in the log2 routine fixed
;     general case sped up about five percent
;     certain cases (e.g., the full screen) sped up over fifty percent
;     the time to dissolve is not directly related to the size of the rect
; differences between version 2 and version 3 are:
;     documentation improved and neatened
;     log2 routine rewritten
;
; comments and suggestions are, of course, welcome.
;
; ******************************************************************************
; *                                                                            *
; * copyright 1984, 1985 by michael s. morton                                  *
; * permission to publish and distribute granted to MacTutor                   *
; * please see details below on using, copying and changing this source.       *
; *                                                                            *
; ******************************************************************************
;
; what this routine does:
; ----------------------
;
; dissBits is like copyBits: it moves one rect to another, in their respective
; bitMaps. it doesn't implement the modes of copyBits, nor clipping to a
; region. what it DOES do is copy the bits in a pseudo-random order, giving
; the appearance of "dissolving" from one image to another. the dissolve is
; rapid: the entire screen will dissolve in under four seconds. (note: smaller
; areas may be SLOWER to dissolve -- see below.)
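;
; a minimal sketch of the pseudo-random order described above, in python rather
; than 68000 code. a shift-and-XOR sequence visits every nonzero value of a
; fixed bit-width exactly once; each value is split into a row and a column, and
; values that land outside the rectangle are skipped. the function name, the
; 5-by-6 rectangle and the use of python are assumptions made for illustration;
; the mask 0x30 is the 6-bit entry of the mask table in this listing, unpacked.
;

```python
def dissolve_order(rows, cols, mask, col_bits):
    """Yield each (row, col) of a rows x cols rectangle once, in scrambled order."""
    col_mask = (1 << col_bits) - 1
    element = mask                       # the first sequence element is the mask itself
    while True:
        row, col = element >> col_bits, element & col_mask
        if row < rows and col < cols:    # clip elements that fall outside the rect
            yield row, col
        carry = element & 1              # remember the bit about to be shifted out
        element >>= 1
        if carry:
            element ^= mask              # flip the "magic bits" for this bit-width
        if element == mask:              # the sequence has closed on itself
            break
    yield 0, 0                           # element 0 never occurs, so do it last


# 5 rows x 6 columns: 3 bits of row + 3 bits of column = 6 bits in all
order = list(dissolve_order(5, 6, 0x30, 3))
```

; every pixel of the rectangle appears exactly once in "order", with (0,0) last.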
;
; copyBits pays attention to the current clipping. this routine doesn't.
;
; other likely differences from copyBits:
;   o the rectangles must have the same extents (not necessarily the same
;     lrbt). if they are not, the routine will return -- doing nothing!
;     no stretching copy is done as copyBits would.
;   o the cursor is hidden during the dissolve, since drawing is done without
;     quickdraw calls. the cursor reappears when the drawing is finished.
;     for an odd effect, change it not to hide the cursor; is this how bill
;     atkinson thought of the spray can in MacPaint?
;   o copyBits may be smart enough to deal with overlapping areas of memory.
;     this routine certainly isn't.
;   o because this routine is desperate for speed, it steals the A5 (globals)
;     register, uses it in the central loop, then restores it before
;     returning. if you have vertical retrace tasks which run during the
;     dissolve, they'll wake up with the bogus A5. since the ROM pulls this
;     trick too, i feel this isn't a bug in MY code, but it IS more likely
;     to expose bugs in VBL tasks. to correct the problem, have your task
;     load A5 from the low RAM location "currentA5" immediately upon starting
;     up, then restore its caller's A5 before returning.
;
; you should know a few implementation details which may help:
;   o copying from a dark area (lots of 1 bits) is slower than from a light
;     area. but just barely (a few per cent).
;   o there is no way to use this to randomly invert a rectangle. instead,
;     copyBits it elsewhere, invert it, and dissBits it back into place.
;   o there is also no way to slow the dissolve of a small area. to do this,
;     copy a large area in which the only difference is the area to change.
;   o if you fade in a solid area, you're likely to see patterns, since the
;     random numbers are so cheesy. don't do this; fade in nifty patterns
;     which will distract your viewers.
;   o very small areas (less than 2 pixels in either dimension) are actually
;     done with a call to the real copyBits routine, since the pseudo-random
;     sequence generator falls apart under those conditions.
;
; a close relative of this routine is "dissBytes", which (as you might guess)
; copies a byte at a time, which is really fast (the whole screen in .1 or .2
; seconds). it works only for certain rectangles, though.
;
; sample calling code:
; -------------------
;
; this is an excerpt from how a prerelease of DarTerminal called this routine.
; note the clever use of "paintbehind". this took about 3 seconds to dissolve
; onto the screen.
;
;var rg: rgnhandle; (* window to copy into *)
;    aport: grafptr; (* port to draw into *)
;    bits: bitmap; (* new bitmap for that port *)
;    r: rect; (* rectangle to draw into *)
;    pat: pattern;
;    text: packed array[1..37] of char;
;    ...
;    aport := grafptr(newptr(sizeof(grafport))); (* get a port *)
;    openport(aport); (* make it current *)
;
;    r := theport^.portbits.bounds; (* start with whole screen *)
;    insetrect(r,100,100); (* get window-size rect *)
;    (* note that the number of bytes per row must be even! *)
;    bits.rowbytes := (((r.right-r.left)+15) div 16) * 2; (* find bytes/row *)
;    bits.baseaddr := qdptr(newptr(bits.rowbytes*(r.bottom-r.top))); (* get space *)
;    bits.bounds := r; (* boundary rect for bitmap *)
;
;    setportbits(bits); (* make new bitmap current *)
;
;    eraserect(r);
;    textfont(london); textsize(18); textface([bold]);
;    text := 'DarTerminal version -1.9 August 1984';
;    textbox(@text,37,r,tejustcenter);
;
;    dissbits(bits,screenport^.portbits,r,r); (* dissolve it in *)
;
;    repeat until getnextevent(mdownmask+keydownmask,anevent); (* let 'em gawk *)
;
;    rg := newrgn; (* get a region to clip with *)
;    rectrgn(rg,r); (* as a rectangle *)
;    paintbehind(windowpeek(frontwindow),rg); (* update this area of screen *)
;
;    disposergn(rg);
;    disposptr(ptr(bits.baseaddr));
;    disposptr(ptr(aport));
;
;
; calling from languages other than pascal:
; ----------------------------------------
;
; this routine uses the standard Lisa Pascal calling sequence. to convert it to
; most C compilers, you'll probably just have to delete this instruction from
; near the end of the main routine:
;     add.l #psize,SP ; unstack parameters
;
; i'd be very interested in hearing about successful uses of this routine from
; other languages.
;
; speed of the dissolve:
; ---------------------
;
; you need to pay attention to this section only if: (a) you want the dissolve
; to run as fast as it can OR (b) you do dissolves of various sizes and want
; them to take proportionate lengths of time.
;
; there are 3 levels of speed; the fastest possible one is chosen for you:
; (1) an ordinary dissolve will work when moving from any bitmap to any bitmap,
;     including on the Lisa under MacWorks. this will dissolve at about 49
;     microseconds per pixel. a rectangle one-quarter the size of the screen
;     will dissolve in just over two seconds. the speed per pixel will vary
;     slightly, and will be less if your rect extents are close to but less
;     than powers of 2.
; (2) the dissolve will speed up if both the source and destination bitmaps have
;     rowBytes fields which are powers of two. if you're copying to the screen
;     on a mac, the rowBytes field already satisfies this. so, make your source
;     bitmap the right width for a cheap speedup -- about 20% faster.
; (3) the fanciest level is intended for copying the whole screen. it'll paint
;     it in about 3.4 seconds (19 microseconds per pixel). actually, painting
;     any rectangle which is the full width of the screen will run at this
;     speed, for what that's worth.
;
; duplication and use of this routine:
; -----------------------------------
;
; this is freeware. you're welcome to copy it and use it in programs. you're
; welcome to modify it, as long as you leave everything up until this section
; unchanged. i'd be very interested in seeing your changes, especially if you
; find ways to make the central loops faster. you're also welcome to port it
; to other machines/languages; i'd appreciate hearing about efforts to do this.
;
; this is "freeware"; please pay me if you use it. why?
;
;   o if you have problems using it, i'll help you debug it.
;   o i'll send you improved, debugged, faster versions.
;   o i'll tell you about other products. this is the first thing i ever
;     wrote for the Mac; wouldn't you like to see what else i've produced?
;     send me some positive feedback!
;
; how much should you pay? my suggestion is:
;     (cost of one copy of the program) * (log10 of number of copies sold)
; if the subroutine is an integral part of your program, double this amount.
; if it's a frill (e.g., you dissolve in your "About MacWhatever"), halve it.
;
; i find it hard to believe that any damages to you or anyone else could come
; from bugs in this routine. but, alas, whether or not you pay me, i can't be
; liable in any way for any problems in it.
;
; send comments, contributions, criticisms, or whatever to:
;     mike morton
;     INFOCOM
;     125 CambridgePark Dr.
;     Cambridge, MA 02140
;
; if, for some reason, you only have a hard copy of this and would like a
; source on a diskette, please contact:
;     MacTutor
;     Source Code Disks
;     P.O. Box 846
;     Placentia, CA. 92670
;     (714) 579-7700
;
;
; things to think about:
; ---------------------
;
;   o clean up the register usage (as if i'll ever actually get around to this)
;   o use a dynamically-built table to avoid the multiply instruction in the general case.
;     this may not be a great idea since not everyone can afford that much space.
;   o adapt the routine to do transfer modes. especially pattern modes, with no source.
;     it'd run significantly faster doing a fill of just black or white! and patXor
;     would be pretty cool too (see also the XOR idea below).
;   o consider a front-end which does clipping (just like copyBits) by:
;       - allocate yet another offscreen bitmap, whose bounds are the intersection of
;         the destination rect with the bounding box of the destination's clipping rgn.
;       - copy from the destination's bitmap into the temporary bitmap.
;       - do a copyBits from the source to the temp, with the requested masking region.
;       - dissolve from the temp to the destination, thus actually dissolving that whole
;         rect, but changing only the clipped stuff.
;   o a nifty way to speed up things would be to XOR the destination into the source [sic!].
;     then the main loop doesn't have to copy a bit; it just tests if the bit is ON in the
;     altered source bitmap and, if so, TOGGLES it in the destination. (i.e., it does a
;     srcXor operation). to repair the source bitmap, a third XOR from the destination to
;     the source should do it [any old-timer will recognize this triple XOR as the best
;     way to swap two registers.] the trick is making the invisible XOR run so fast that
;     the time to do it is less than the savings in the visible part. or perhaps the batch
;     XORs could be done in spare-time...
;   o implement a partial dissolve. this would let alex animate star trek-style teleporting.
;     there are a lot of ways to do this. the easiest might be to include a counter in the
;     loop ("a register! my kingdom for a register!"). alternately, we could assume that
;     the list of starting and ending points could be limited, allowing a table-based system.
;     this is something worth looking into, but could be real difficult.
;   o look at rearranging instructions so CPU-intensive ones aren't grouped, allowing us to
;     sneak in between video cycles more often.
;   o add some real error-handling. how should we do this? return a status? fault? we
;     could also return info to our caller on which of the three loops got used, so they know
;     they're getting the speed they want.
;   o don't use the BITWIDTH routine to test for exact powers of two in MULCHK. a nonzero
;     number x is an exact power of two if (x & -x) equals x.
;
;
;
; -- end of introduction; real stuff starts here --
;
;
;
; DISSBITS SUBROUTINE
; MDS ASSEMBLER VERSION
; Converted by David E. Smith for MacTutor.
;
;
        xdef    dissBits

        Include QuickEqu.D              ; MDS toolbox equates and traps
        Include SysEqu.D
        Include ToolEqu.D
        Include MacTraps.D

        MACRO   .equ = equ|             ; convert Lisa assembler stuff to MDS Mac
        MACRO   _hidecurs = _HideCursor|
        MACRO   _showcurs = _ShowCursor|
;
; definitions of the "ours" record: this structure, of which there are two copies in
; our stack frame, is a sort of bitmap:
;
oRows   .equ    0                       ; (word) number of last row (first is 0)
oCols   .equ    oRows+2                 ; (word) number of last column (first is 0)
oLbits  .equ    oCols+2                 ; (word) size of left margin within 1st byte
oStride .equ    oLbits+2                ; (word) stride in memory from row to row
oBase   .equ    oStride+2               ; (long) base address of bitmap
osize   .equ    oBase+4                 ; size, in bytes, of "ours" record
;
; stack frame elements:
;
srcOurs .equ    -osize                  ; (osize) our view of source bits
dstOurs .equ    srcOurs-osize           ; (osize) our view of target bits
sflast  .equ    dstOurs                 ; relative address of last s.f. member
sfsize  .equ    -sflast                 ; size of s.f. for LINK (must be EVEN!)
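;
; a small worked check of the equates above, sketched in python. it lays out the
; "ours" record as four 16-bit words followed by one 32-bit long, then derives the
; stack-frame offsets the same way the .equ lines do. the helper name and the use
; of python's struct module are assumptions for illustration only; the field names
; and sizes are taken from the equates.
;

```python
import struct

# "ours" record: four words, then one long (big-endian, as on the 68000)
OURS_FORMAT = ">hhhhl"
FIELDS = ["oRows", "oCols", "oLbits", "oStride", "oBase"]


def field_offsets(fmt, names):
    """Return ({field: byte offset}, total size) for a packed record."""
    offsets, position = {}, 0
    for code, name in zip(fmt[1:], names):
        offsets[name] = position
        position += struct.calcsize(">" + code)
    return offsets, position


offsets, osize = field_offsets(OURS_FORMAT, FIELDS)

# stack frame: two copies of the record, below the frame pointer
srcOurs = -osize
dstOurs = srcOurs - osize
sfsize = -dstOurs            # what LINK has to reserve; must be even
```

; with these sizes the record is 12 bytes and the frame is 24 bytes, which is even.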
;
; parameter offsets from the stack frame pointer, A6:
; last parameter is above return address and old s.f.
;
dRptr   .equ    4+4                     ; ^destination rectangle
sRptr   .equ    dRptr+4                 ; ^source rectangle
dBptr   .equ    sRptr+4                 ; ^destination bitMap
sBptr   .equ    dBptr+4                 ; ^source bitMap
plast   .equ    sBptr+4                 ; address just past last parameter
psize   .equ    plast-dRptr             ; size of parameters, in bytes
;
; entrance: set up a stack frame, save some registers, hide the cursor.
;
dissBits:                               ; main entry point
        link    A6,#-sfsize             ; set up a stack frame
        movem.l D3-D7/A2-A5,-(SP)       ; save registers compiler may need
        _hidecurs                       ; don't let the cursor show for now
;
; convert the source and destination bitmaps and rectangles to a format we prefer.
; we won't look at these parameters after this.
;
        move.l  sBptr(A6),A0            ; point to source bitMap
        move.l  sRptr(A6),A1            ; and source rectangle
        lea     srcOurs(A6),A2          ; and our source structure
        bsr     CONVERT                 ; convert to our format
        move.l  dBptr(A6),A0            ; point to destination bitMap
        move.l  dRptr(A6),A1            ; and rectangle
        lea     dstOurs(A6),A2          ; and our structure
        bsr     CONVERT                 ; convert to our format
;
; check that the rectangles match in size.
;
        move.w  srcOurs+oRows(A6),D0    ; pick up the number of rows
        cmp.w   dstOurs+oRows(A6),D0    ; same number of rows?
        bne     ERROR                   ; nope -- bag it
        move.w  srcOurs+oCols(A6),D0    ; check the number of columns
        cmp.w   dstOurs+oCols(A6),D0    ; same number of columns, too?
        bne     ERROR                   ; that's a bozo no-no
;
; figure the bit-width needed to span the columns, and the rows.
;
        move.w  srcOurs+oCols(A6),D0    ; get count of columns
        ext.l   D0                      ; make it a longword
        bsr     LOG2                    ; figure bit-width
        move.w  D0,D1                   ; set aside that result
        beq     SMALL                   ; too small? wimp out and do it with copyBits
        move.w  srcOurs+oRows(A6),D0    ; get count of rows
        ext.l   D0                      ; make it a longword
        bsr     LOG2                    ; again, find the bit-width
        tst.w   D0                      ; is the result zero?
        beq     SMALL                   ; if so, our algorithm will screw up
;
; set up various constants we'll need in the innermost loop
;
        move.l  #1,D5                   ; set up...
        lsl.l   D1,D5                   ; ...the bit mask which is...
        sub.l   #1,D5                   ; ...bit-width (cols) 1's
        add.w   D1,D0                   ; find total bit-width (rows plus columns)
        lea     TABLE,A0                ; point to the table of XOR masks
        moveq   #0,D3                   ; clear out D3 before we fill the low byte
        move.b  0(A0,D0),D3             ; grab the correct XOR mask in D3
;
; the table is saved compactly, since none of the masks are wider than a byte. we have to unpack it so
; the high-order bit of the D0-bit-wide field is on:
;
UNPACK: add.l   D3,D3                   ; shift left by one
        bpl.s   UNPACK                  ; keep moving until the top bit that's on is aligned at the top end
        rol.l   D0,D3                   ; now swing the top D0 bits around to be the bottom D0 bits, the mask
        move.l  D3,D0                   ; 1st sequence element is the mask itself
;
; do all kinds of preparation:
;
        move.l  srcOurs+oBase(A6),D2    ; set up base pointer for our source bits
        lsl.l   #3,D2                   ; make it into a bit address
        move.l  D2,A0                   ; put it where the fast loop will use it
        move.w  srcOurs+oLbits(A6),D2   ; now pick up source left margin
        ext.l   D2                      ; make it a longword
        add.l   D2,A0                   ; and make A0 useful for odd routine below
        move.l  dstOurs+oBase(A6),D2    ; set up base pointer for target
        lsl.l   #3,D2                   ; again, bit addressing works out faster
        move.l  D2,A1                   ; stuff it where we want it for the loop
        move.w  dstOurs+oLbits(A6),D2   ; now pick up destination left margin
        ext.l   D2                      ; make it a longword
        add.l   D2,A1                   ; and make A1 useful, too
        move.w  srcOurs+oCols(A6),A2    ; pick up the often-used count of columns
        move.w  srcOurs+oRows(A6),D2    ; and of rows
        add.w   #1,D2                   ; make row count one-too-high for compares
        ext.l   D2                      ; and make it a longword
        lsl.l   D1,D2                   ; slide it to line up w/rows part of D0
        move.l  D2,A4                   ; and save that somewhere useful
        move.w  D1,D2                   ; put log2(columns) in a safe place (sigh)
;
; try to reduce the amount we shift down D2. this involves:
;     halving the strides as long as each is even, decrementing D2 as we go
;     masking the bottom bits off D4 when we extract the row count in the loop
;
; alas, we can't always shift as little as we want. for instance, if we don't
; shift down far enough, the row count will be so high as to exceed a halfword,
; and the dread mulu instruction won't work (it eats only word operands). so,
; we have to have an extra check to take us out of the loop early.
;
        move.w  srcOurs+oStride(A6),D4  ; pick up source stride
        move.w  dstOurs+oStride(A6),D7  ; and target stride
        move.w  srcOurs+oRows(A6),D1    ; pick up row count for kludgey check
        tst.w   D2                      ; how's the bitcount?
        beq.s   HALFDONE                ; skip out if already down to zero
HALFLOOP:
        btst    #0,D4                   ; is this stride even?
        bne.s   HALFDONE                ; nope -- our work here is done
        btst    #0,D7                   ; how about this one?
        bne.s   HALFDONE                ; have to have both even
        lsl.w   #1,D1                   ; can we keep max row number in a halfword?
        bcs.s   HALFDONE                ; nope -- D2 mustn't get any smaller!
        lsr.w   #1,D4                   ; halve each stride...
        lsr.w   #1,D7                   ; ...like this
        sub.w   #1,D2                   ; and remember not to shift down as far
        bne.s   HALFLOOP                ; loop unless we're down to no shift at all
HALFDONE:                               ; no tacky platitudes, please
        move.w  D4,srcOurs+oStride(A6)  ; put back source stride
        move.w  D7,dstOurs+oStride(A6)  ; and target stride
;
; make some stuff faster to access -- use the fact that (An) is faster to access
; than d(An). this means we'll misuse our frame pointer, but don't worry -- we'll
; restore it before we use it again.
;
        move.w  srcOurs+oStride(A6),A5  ; make source stride faster to access, too
        move.l  A6,-(SP)                ; save framitz pointer
        move.w  dstOurs+oStride(A6),A6  ; pick up destination stride
        move.l  #0,D6                   ; we do only AND.W x,D6 -- but ADD.L D6,x
        clr.w   -(SP)                   ; reserve room for function result
        bsr     MULCHK                  ; go see if strides are powers of two
        tst.w   (SP)+                   ; can we eliminate the horrible MULUs?
        bne     NOMUL                   ; yes! hurray!
;
; main loop: map the sequence element into rows and columns, check if it's in bounds
; and skip on if it's not, flip the appropriate bit, generate the next element in the
; sequence, and loop if the sequence isn't done.
;
;
; check the row bounds. note that we can check the row before extracting it from
; D0, ignoring the bits at the bottom of D0 for the columns. to get these bits
; to be ignored, we had to make A4 one-too-high before shifting it up to align it.
;
LOOP:                                   ; here for another time around
        cmp.l   A4,D0                   ; is row in bounds?
        bge.s   NEXT                    ; no: clip this
;
; map it into the column; check bounds. note that we save this check for second;
; it's a little slower because of the move and mask.
;
; chuck sagely points out that when the "bhi" at the end of the loop takes, we
; know we can ignore the above comparison. thanks, chuck. you're a great guy.
;
LOOPROW:                                ; here when we know the row number is OK
        move.w  D0,D6                   ; copy the sequence element
        and.w   D5,D6                   ; find just the column number
        cmp.w   A2,D6                   ; too far to the right? (past oCols?)
        bgt.s   NEXT                    ; yes: skip out
        move.l  D0,D4                   ; we know element will be used; copy it
        sub.w   D6,D4                   ; remove column's bits
        lsr.l   D2,D4                   ; shift down to row, NOT right-justified
;
; get the source byte, and bit offset. D4 has the bit offset in rows, and
; D6 is columns.
;
        move.w  A5,D1                   ; get the stride per row (in bits)
        mulu    D4,D1                   ; stride * row; find source row's offset in bits
        add.l   D6,D1                   ; add in column offset (bits)
        add.l   A0,D1                   ; plus base of bitmap (bits [sic])
        move.b  D1,D7                   ; save the bottom three bits for the BTST
        lsr.l   #3,D1                   ; while we shift down to a byte address
        move.l  D1,A3                   ; and save that for the test, too
        not.b   D7                      ; get right bit number (compute #7-D7)
;
; find the destination bit address and bit offset
;
        move.w  A6,D1                   ; extract cunningly hidden destination stride
        mulu    D1,D4                   ; stride*row number = dest row's offset in bits
        add.l   D6,D4                   ; add in column bit offset
        add.l   A1,D4                   ; and base address, also in bits
        move.b  D4,D6                   ; set aside the bit displacement
        lsr.l   #3,D4                   ; make a byte displacement
        not.b   D6                      ; get right bit number (compute #7-D6)
        btst    D7,(A3)                 ; test the D7th bit of source byte
        move.l  D4,A3                   ; point to target byte (don't lose CC from btst)
        bne.s   SETON                   ; if on, go set destination on
        bclr    D6,(A3)                 ; else clear destination bit
;
; find the next sequence element. see knuth, vol ii., page 29 for sketchy details.
;
NEXT:                                   ; jump here if D0 not in bounds
        lsr.l   #1,D0                   ; slide one bit to the right
        bhi.s   LOOPROW                 ; if no carry out, but not zero, loop
        eor.l   D3,D0                   ; flip magic bits appropriate to the bitwidth we want...
        cmp.l   D3,D0                   ; ...but has this brought us to square 1?
        bne.s   LOOP                    ; if not, loop back; else...
        bra     DONE                    ; ...we're finished
SETON:  bset    D6,(A3)                 ; source bit was on: set destination on
; copy of above code, stolen for inline speed -- sorry.
        lsr.l   #1,D0                   ; slide one bit to the right
        bhi.s   LOOPROW                 ; if no carry out, but not zero, loop
        eor.l   D3,D0                   ; flip magic bits...
        cmp.l   D3,D0                   ; ...but has this brought us to square 1?
        bne.s   LOOP                    ; if not, loop back; else fall through
;
; here when we're done; the (0,0) point has not been done yet. this is
; really the (0,left margin) point. we also jump here from another copy loop.
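;
; a minimal sketch, in python, of why the sequence never produces zero and so
; leaves the first bit for last. it unpacks the byte-sized masks the way the
; UNPACK loop does (slide the byte up until its top 1 bit is the top bit of the
; field), applies the shift-right-and-XOR step from NEXT, and counts the steps
; until the sequence returns to its starting value. the function names are
; hypothetical; the packed values are copied from the mask table in this
; listing for bit-widths 2 through 16.
;

```python
# packed masks copied from the table in this listing, bit-widths 2..16
PACKED = {2: 3, 3: 3, 4: 3, 5: 5, 6: 3, 7: 3, 8: 23, 9: 17, 10: 9,
          11: 5, 12: 101, 13: 27, 14: 53, 15: 3, 16: 45}


def unpack_mask(packed, width):
    """Slide the packed byte left until its top 1 bit is bit (width - 1)."""
    return packed << (width - packed.bit_length())


def next_element(element, mask):
    """One step: shift right; if a 1 bit fell off the end, XOR in the mask."""
    carry = element & 1
    element >>= 1
    return element ^ mask if carry else element


def period(width):
    """Count steps until the sequence, started at the mask, comes back to it."""
    mask = unpack_mask(PACKED[width], width)
    element, steps = mask, 0
    while True:
        element = next_element(element, mask)
        steps += 1
        if element == mask:
            return steps
```

; for a bit-width of n the count comes out to 2^n - 1: every value except zero.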
;
DONE:   move.l  (SP)+,A6                ; restore stack frame pointer
        move.w  srcOurs+oLbits(A6),D0   ; pick up bit offset of left margin
        move.w  dstOurs+oLbits(A6),D1   ; and ditto for target
        not.b   D0                      ; flip to number the bits for 68000
        not.b   D1                      ; ditto
; alternate, late entrance, when SCREEN routine has already set up D0 and D1 (it doesn't want the bit
; offset negated).
DONEA:                                  ; land here with D0, D1 set
        move.l  srcOurs+oBase(A6),A0    ; set up base pointer for our source bits
        move.l  dstOurs+oBase(A6),A1    ; and pointer for target
        bset    D1,(A1)                 ; assume source bit was on; set target
        btst    D0,(A0)                 ; was first bit of source on?
        bne.s   DONE2                   ; yes: skip out
        bclr    D1,(A1)                 ; no: oops! set it right, and fall through
;
; return
;
DONE2:                                  ; here when we're really done
ERROR:                                  ; we return silently on errors
        _showcurs                       ; let's see this again
        movem.l (SP)+,D3-D7/A2-A5       ; restore lots of registers
        unlk    A6                      ; restore caller's stack frame pointer
        move.l  (SP)+,A0                ; pop return address
        add.l   #psize,SP               ; unstack parameters
        jmp     (A0)                    ; home to mother
;
; -----------------------------------------------------------------------------------
;
; sleazo code for when we're asked to dissolve very small regions. if either dimension of the rectangle
; is too small, we bag it and just delegate the problem to copyBits. a possible problem with this is if
; someone decides to substitute us for the standard copyBits routine -- this case will become recursive...
;
SMALL:                                  ; here when it's too small to copy ourselves
        move.l  sBptr(A6),-(SP)         ; push args: source bitmap
        move.l  dBptr(A6),-(SP)         ; destination bitmap
        move.l  sRptr(A6),-(SP)         ; source rectangle
        move.l  dRptr(A6),-(SP)         ; destination rectangle
        move.w  #srcCopy,-(SP)          ; transfer mode -- source copy
        clr.l   -(SP)                   ; mask region -- NIL
        _copyBits                       ; do the copy in quickdraw-land
        bra     DONE2                   ; head for home
;
; -----------------------------------------------------------------------------------
;
; code identical to the usual loop, but A5 and A6 have been changed to shift counts.
; other than that, it's the same. really it is! well, no, wait a minute...
; because we don't have to worry about the word-size mulu operands, we can collapse
; the shifts and countershifts further as shown below:
NOMUL:                                  ; here for alternate version of loop
        tst.w   D2                      ; is right shift zero?
        beq.s   NOMUL2                  ; yes: can't do much more...
        cmp.w   #0,A5                   ; how about one left shift (for source stride)?
        beq.s   NOMUL2                  ; yes: ditto
        cmp.w   #0,A6                   ; and the other left shift (destination stride)?
        beq.s   NOMUL2                  ; yes: can't do much more...
        sub.w   #1,D2                   ; all three...
        sub.w   #1,A5                   ; ...are...
        sub.w   #1,A6                   ; ...collapsible
        bra.s   NOMUL                   ; go see if we can go further
;
; see if we can do the super-special-case loop, which basically is equivalent to any rectangle
; where the source and destination are both exactly the width of the Mac screen.
;
NOMUL2:                                 ; here when D2, A5, and A6 are all collapsed
        tst.w   D2                      ; did this shift get down to zero?
        bne.s   NLOOP                   ; no: skip to first kludged loop
        cmp.w   #0,A5                   ; is this zero?
        bne.s   NLOOP                   ; no: again, can't make further optimization
        cmp.w   #0,A6                   ; how about this?
        bne.s   NLOOP                   ; no: the best-laid plans of mice and men...
        cmp.w   A2,D5                   ; is there no check on the column?
        bne.s   NLOOP                   ; not a power-of-two columns; rats!
        move.w  A0,D6                   ; grab the base address of the source
        and.b   #7,D6                   ; select the low three bits
        bne.s   NLOOP                   ; doesn't sit on a byte boundary; phooey
        move.w  A1,D6                   ; now try the base of the destination
        and.b   #7,D6                   ; and select its bit offset
        beq.s   SCREEN                  ; yes! do extra-special loop!
;
; fast, but not super-fast loop, used when both source and destination bitmaps have strides which are
; powers of two.
;
NLOOP:                                  ; here for another time around
        cmp.l   A4,D0                   ; is row in bounds?
        bge.s   NNEXT                   ; no: clip this
NLOOPROW:                               ; here when we know the row number is OK
        move.w  D0,D6                   ; copy the sequence element
        and.w   D5,D6                   ; find just the column number
        cmp.w   A2,D6                   ; too far to the right? (past oCols?)
        bgt.s   NNEXT                   ; yes: skip out
        move.l  D0,D4                   ; we know element will be used; copy it
        sub.w   D6,D4                   ; remove column's bits
        lsr.l   D2,D4                   ; shift down to row, NOT right-justified
        move.w  A5,D7                   ; get log2 of stride per row (in bits)
        move.l  D4,D1                   ; make a working copy of the row number
        lsl.l   D7,D1                   ; * stride/row is source row's offset in bits
        add.l   D6,D1                   ; add in column offset (bits)
        add.l   A0,D1                   ; plus base of bitmap (bits [sic])
        move.b  D1,D7                   ; save the bottom three bits for the BTST
        lsr.l   #3,D1                   ; while we shift down to a byte address
        move.l  D1,A3                   ; and save that for the test, too
        not.b   D7                      ; get right bit number (compute #7-D7)
        move.w  A6,D1                   ; extract log2 of destination stride
        lsl.l   D1,D4                   ; stride*row number = dest row's offset in bits
        add.l   D6,D4                   ; add in column bit offset
        add.l   A1,D4                   ; and base address, also in bits
        move.b  D4,D6                   ; set aside the bit displacement
        lsr.l   #3,D4                   ; make a byte displacement
        not.b   D6                      ; get right bit number (compute #7-D6)
        btst    D7,(A3)                 ; test the D7th bit of source byte
        move.l  D4,A3                   ; point to target byte (don't ruin CC from btst)
        bne.s   NSETON                  ; if on, go set destination on
        bclr    D6,(A3)                 ; else clear destination bit
NNEXT:                                  ; jump here if D0 not in bounds
        lsr.l   #1,D0                   ; slide one bit to the right
        bhi.s   NLOOPROW                ; if no carry out, but not zero, loop
        eor.l   D3,D0                   ; flip magic bits...
        cmp.l   D3,D0                   ; ...but has this brought us to square 1?
        bne.s   NLOOP                   ; if not, loop back; else...
        bra     DONE                    ; ...we're finished
NSETON: bset    D6,(A3)                 ; source bit was on: set destination on
        lsr.l   #1,D0                   ; slide one bit to the right
        bhi.s   NLOOPROW                ; if no carry out, but not zero, loop
        eor.l   D3,D0                   ; flip magic bits...
        cmp.l   D3,D0                   ; ...but has this brought us to square 1?
        bne.s   NLOOP                   ; if not, loop back; else fall through
        bra     DONE                    ; and finish
;
; -----------------------------------------------------------------------------------
;
; super-special case, which happens to hold for the whole mac screen -- or subsets
; of it which are as wide as the screen. here, we've found that the shift counts
; in D2, A5, and A6 can all be collapsed to zero. and D5 equals A2, so there's
; no need to check whether D6 is in limits -- or even take it out of D0! so, this loop
; is the NLOOP code without the shifts or the check on the column number. should
; run like a bat; have you ever seen a bat run?
;
; oh, yes, one further restriction -- the addresses in A0 and A1 must point to
; integral byte addresses with no bit offset. (this still holds for full-screen
; copies.) because both the source and destination are byte-aligned, we can skip
; the ritual Negation Of The Bit Offset which the 68000 usually demands.
SCREEN:                                 ; here to set up to do the whole screen, or at least its width
        move.l  A0,D6                   ; take the base source address...
        lsr.l   #3,D6                   ; ... and make it a byte address
        move.l  D6,A0                   ; replace pointer
        move.l  A1,D6                   ; now do the same...
        lsr.l   #3,D6                   ; ...for...
        move.l  D6,A1                   ; ...the destination address
        bra.s   N2LOOP                  ; jump into loop
N2HEAD:                                 ; here when we shifted and a bit carried out
        eor.l   D3,D0                   ; flip magic bits to make the sequence work
N2LOOP:                                 ; here for another time around
        cmp.l   A4,D0                   ; is row in bounds?
        bge.s   N2NEXT                  ; no: clip this
N2LOOPROW:                              ; here when we know the row number is OK
        move.l  D0,D1                   ; copy row number, shifted up, plus column offset
        lsr.l   #3,D1                   ; while we shift down to a byte offset
        btst    D0,0(A0,D1)             ; test bit of source byte
        bne.s   N2SETON                 ; if on, go set destination on
        bclr    D0,0(A1,D1)             ; else clear destination bit
N2NEXT:                                 ; jump here if D0 not in bounds
        lsr.l   #1,D0                   ; slide one bit to the right
        bhi.s   N2LOOPROW               ; if no carry out, but not zero, loop
        bne.s   N2HEAD                  ; if carry out, but not zero, loop earlier
        bra.s   N2DONE                  ; 0 means next sequence element would have been D3
N2SETON:
        bset    D0,0(A1,D1)             ; source bit was on: set destination on
        lsr.l   #1,D0                   ; slide one bit to the right
        bhi.s   N2LOOPROW               ; if no carry out, but not zero, loop
        bne.s   N2HEAD                  ; if carry out, but not zero, loop earlier
                                        ; zero means the loop has closed on itself
;
; because our bit-numbering isn't like that of the other two loops, we set up D0 and D1
; ourselves before joining a bit late with the common code to get the last bit.
;
N2DONE: move.l  (SP)+,A6                ; restore the stack frame pointer
        move.w  srcOurs+oLbits(A6),D0   ; pick up bit offset of left margin
        move.w  dstOurs+oLbits(A6),D1   ; and ditto for target
        bra     DONEA                   ; go do the first bit, which the sequence doesn't cover
;
; -----------------------------------------------------------------------------------
;
; mulchk -- see if we can do without multiply instructions.
;
; calling sequence:
;     A5 holds the source stride
;     A6 holds the destination stride
;     clr.w -(SP) ; reserve room for boolean function return
;     bsr MULCHK ; go check things out
;     tst.w (SP)+ ; test result
;     bne.s SHIFT ; if non-zero, we can shift and not multiply
;
;     (if we can shift, A5 and A6 have been turned into shift counts)
;
; registers used: none (A5, A6)
MULCHK: movem.l D0-D3,-(SP)             ; stack caller's registers
        move.l  A5,D0                   ; take the source stride
        bsr     BITWIDTH                ; take log base 2
        move.l  #1,D1                   ; pick up a one...
        lsl.l   D0,D1                   ; ...and try to recreate the stride
        cmp.l   A5,D1                   ; does it come out the same?
        bne.s   NOMULCHK                ; nope -- bag it
        move.w  D0,D3                   ; save magic logarithm of source stride
        move.l  A6,D0                   ; yes -- now how about destination stride?
        bsr     BITWIDTH                ; convert that one, also
        move.l  #1,D1                   ; again, try a single bit...
        lsl.l   D0,D1                   ; ...and see if original # was 1 bit
        cmp.l   A6,D1                   ; how'd it come out?
        bne.s   NOMULCHK                ; doesn't match -- bag this
;
; we can shift instead of multiplying. change address registers & tell our caller.
;
        move.w  D3,A5                   ; set up shift for source stride
        move.w  D0,A6                   ; and for destination stride
        st      4+16(SP)                ; tell our caller what's what
        bra.s   MULRET                  ; and return
NOMULCHK:
        sf      4+16(SP)                ; tell caller we can't optimize
MULRET:                                 ; here to return; result set
        movem.l (SP)+,D0-D3             ; pop some registers
        rts                             ; all set
;
; -----------------------------------------------------------------------------------
;
; table of (longword) masks to XOR in strange Knuthian algorithm. the first table
; entry is for a bit-width of two, so the table actually starts two bytes before
; that. hardware jocks among you may recognize this scheme as the software analog
; of a "maximum-length sequence generator".
;
; to save a bit of room, masks are packed in bytes, but should be aligned as
; described in the code before being used.
;
        .ALIGN  2
table:  DC.B    0,0                     ; first element is #2
        DC.B    3                       ; 2
        DC.B    3                       ; 3
        DC.B    3                       ; 4
        DC.B    5                       ; 5
        DC.B    3                       ; 6
        DC.B    3                       ; 7
        DC.B    23                      ; 8
        DC.B    17                      ; 9
        DC.B    9                       ; 10
        DC.B    5                       ; 11
        DC.B    101                     ; 12
        DC.B    27                      ; 13
        DC.B    53                      ; 14
        DC.B    3                       ; 15
        DC.B    45                      ; 16
        DC.B    9                       ; 17
        DC.B    129                     ; 18
        DC.B    57                      ; 19
        DC.B    9                       ; 20
        DC.B    5                       ; 21
        DC.B    3                       ; 22
        DC.B    33                      ; 23
        DC.B    27                      ; 24
        DC.B    9                       ; 25
        DC.B    113                     ; 26
        DC.B    57                      ; 27
        DC.B    9                       ; 28
        DC.B    5                       ; 29
        DC.B    101                     ; 30
        DC.B    9                       ; 31
        DC.B    163                     ; 32
        .align  2
;
; -----------------------------------------------------------------------------------
;
; convert -- convert a parameter bitMap and rectangle to our internal form.
;
; calling sequence:
;       lea     bitMap,A0       ; point to the bitmap
;       lea     rect,A1         ; and the rectangle inside it
;       lea     ours,A2         ; and our data structure
;       bsr     CONVERT         ; call us
;
; when done, all fields of the "ours" structure are filled in:
;       oBase is the address of the first byte in which any bits are to be changed
;       oLbits is the number of bits into that first byte which are ignored
;       oStride is the stride from one row to the next, in bits
;       oCols is the number of the highest column in the rectangle (column count less one)
;       oRows is the number of the highest row (row count less one)
;
; registers used: D0, D1, D2
;
CONVERT:
;
; save the starting byte and bit address of the stuff:
;
        move.w  top(A1),D0              ; pick up top of inner rectangle
        sub.w   bounds+top(A0),D0       ; figure rows to skip within bitmap
        mulu    rowbytes(A0),D0         ; compute bytes to skip (relative offset)
        add.l   baseaddr(A0),D0         ; find absolute address of first row to use
        move.w  left(A1),D1             ; pick up left coordinate of inner rect
        sub.w   bounds+left(A0),D1      ; find columns to skip
        move.w  D1,D2                   ; copy that
        and.w   #7,D2                   ; compute bits to skip in first byte
        move.w  D2,oLbits(A2)           ; save that in the structure
        lsr.w   #3,D1                   ; convert column count from bits to bytes
        ext.l   D1                      ; convert to a long value, so we can...
        add.l   D1,D0                   ; add to row start in bitmap to find 1st byte
        move.l  D0,oBase(A2)            ; save that in the structure
;
; save the stride of the bitmap; this is the same as for the original, but in bits.
;
        move.w  rowbytes(A0),D0         ; pick up the stride
        lsl.w   #3,D0                   ; multiply by eight to get a bit stride
        move.w  D0,oStride(A2)          ; stick it in the target structure
;
; save the number of rows and columns.
;
        move.w  bottom(A1),D0           ; get the bottom of the rectangle
        sub.w   top(A1),D0              ; less the top coordinate
        sub.w   #1,D0                   ; get number of highest row (1st is zero)
        bmi.s   CERROR                  ; nothing to do? (note: 0 IS ok)
        move.w  D0,oRows(A2)            ; save that in the structure
        move.w  right(A1),D0            ; get the right edge of the rectangle
        sub.w   left(A1),D0             ; less the left coordinate
        sub.w   #1,D0                   ; make it zero-based
        bmi     CERROR                  ; nothing to do here?
        move.w  D0,oCols(A2)            ; save that in the structure
;
; all done. return.
;
        rts
;
; error found in CONVERT. pop return and jump to the error routine, such as it is.
;
CERROR:
        addq.l  #4,SP                   ; pop four bytes of return address.
        bra     ERROR                   ; return silently
;
; -----------------------------------------------------------------------------------
;
; log2 and bitwidth -- two entry points into one bit-counting loop.
;
; as coded, BITWIDTH subtracts one before counting, so it is the entry that gives the
; ceiling of the log, base 2, of a number (512 gives 9; this is what MULCHK relies on).
; LOG2 counts how many bits wide the number itself is (512 gives 10).
;
; calling sequence:
;       move.l  N,D0            ; store the number in D0
;       bsr     LOG2            ; call us
;       move.w  D0,...          ; D0 contains the word result
;
; registers used: D2, (D0)
;
BITWIDTH:
        sub.l   #1,D0                   ; so 2**n works right (sigh)
LOG2:
        tst.l   D0                      ; did they pass us a zero?
        beq.s   LOGDONE                 ; call log2(0) zero -- what the heck...
                                        ; (a one passed to BITWIDTH ends up here too)
        move.w  #32,D2                  ; initialize count
LOG2LP:
        lsl.l   #1,D0                   ; slide bits to the left by one
        dbcs    D2,LOG2LP               ; decrement and loop until a bit falls off
        move.w  D2,D0                   ; save our value where we promised it
LOGDONE:                                ; here with final value in D0
        rts                             ; and return
        END                             ; procedure dissBits
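;
; -----------------------------------------------------------------------------------
;
; the packed mask table and the shift-and-XOR step used by the loops above can be
; modeled in C. this is only a sketch under assumptions, not the routine itself:
; the names packed_mask, aligned_mask, next_element, sequence_period and ceil_log2
; are made up for illustration, and the alignment rule (slide the byte left until
; its highest set bit sits at bit width-1) is inferred from the table values, since
; the code that does the aligning lives elsewhere in this file.
;

```c
#include <assert.h>
#include <stdint.h>

/* Packed feedback bytes copied from "table" in the listing; the index is the
   bit width, and entries 0 and 1 are the two filler bytes. */
static const uint8_t packed_mask[33] = {
      0,   0,   3,   3,   3,   5,   3,   3,  23,  17,   9,
      5, 101,  27,  53,   3,  45,   9, 129,  57,   9,   5,
      3,  33,  27,   9, 113,  57,   9,   5, 101,   9, 163
};

/* Assumed alignment rule: slide the byte left until its highest set bit
   reaches bit (width - 1). Valid for widths 2 through 32. */
static uint32_t aligned_mask(unsigned width)
{
    assert(width >= 2 && width <= 32);
    uint32_t mask = packed_mask[width];
    uint32_t top = (uint32_t)1 << (width - 1);
    while ((mask & top) == 0)
        mask <<= 1;
    return mask;
}

/* One step of the sequence: shift right, and XOR in the mask if a 1 fell off. */
static uint32_t next_element(uint32_t value, uint32_t mask)
{
    uint32_t carry = value & 1u;
    value >>= 1;
    return carry ? (value ^ mask) : value;
}

/* Count the steps until the sequence started at 1 comes back to 1. */
static uint32_t sequence_period(unsigned width)
{
    uint32_t mask = aligned_mask(width);
    uint32_t value = 1, steps = 0;
    do {
        value = next_element(value, mask);
        steps++;
    } while (value != 1);
    return steps;
}

/* Model of the BITWIDTH entry: the number of bits needed to hold n - 1,
   which is the ceiling of log2(n) for n >= 1. */
static unsigned ceil_log2(uint32_t n)
{
    unsigned bits = 0;
    for (n -= 1; n != 0; n >>= 1)
        bits++;
    return bits;
}
```

; under these assumptions a 4-bit sequence visits all 15 non-zero values before it
; repeats, and zero never turns up, which fits the listing's habit of doing the
; first bit separately because the sequence doesn't cover it.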